Explainable artificial intelligence on life satisfaction, diabetes mellitus and its comorbid condition

This study uses artificial intelligence for testing (1) whether the comorbidity of diabetes and its comorbid condition is very strong in the middle-aged or old (hypothesis 1) and (2) whether major determinants of the comorbidity are similar for different pairs of diabetes and its comorbid condition (hypothesis 2). Three pairs are considered: diabetes-cancer, diabetes-heart disease and diabetes-mental disease. Data came from the Korean Longitudinal Study of Ageing (2016–2018), with 5527 participants aged 56 or more. The evaluation of the hypotheses was based on (1) whether diabetes and its comorbid condition in 2016 were top-5 determinants of the comorbidity in 2018 (hypothesis 1) and (2) whether top-10 determinants of the comorbidity in 2018 were similar for different pairs of diabetes and its comorbid condition (hypothesis 2). Based on random forest variable importance, diabetes and its comorbid condition in 2016 were top-2 determinants of the comorbidity in 2018. Top-10 determinants of the comorbidity in 2018 were the same for different pairs of diabetes and its comorbid condition: body mass index, income, age, life satisfaction-health, life satisfaction-economic, life satisfaction-overall, subjective health and children alive in 2016. In terms of SHAP values, the probability of the comorbidity is expected to decrease by 0.02–0.03 when life satisfaction overall is included in the model. This study supports the two hypotheses, highlighting the importance of preventive measures for body mass index, socioeconomic status, life satisfaction and family support to manage diabetes and its comorbid condition.


Scientific Reports | (2023) 13:11651 | https://doi.org/10.1038/s41598-023-36285-z

[...] Study of Ageing (2014-2016) data and artificial intelligence models (the artificial/recurrent neural network) for testing the hypotheses on the comorbidity of the diseases and its major determinants. This study is an extension of the framework above to comorbidity among diabetes, cancer, heart disease and mental disease in a middle-aged or old population. Moreover, we introduced the Shapley additive explanations (SHAP) values to analyze the direction of association between a major determinant and the comorbidity in the prediction model. To the best of our knowledge, this is one of the earliest endeavors to adopt a cutting-edge method of explainable artificial intelligence. This study is expected to have global implications, given that cancer, ischemic heart disease, depressive disorders and diabetes were top-4 causes of death or disability in the world for 2017-2018 [1][2][3][4][10]. In this context, this study tests the following hypotheses from the literature above, considering three pairs of diabetes and its comorbid condition (diabetes-cancer, diabetes-heart disease and diabetes-mental disease):

Hypothesis 1
The comorbidity of diabetes and its comorbid condition is very strong in the middle-aged or old.

Hypothesis 2
Major determinants of the comorbidity are similar for different pairs of diabetes and its comorbid condition.

Results
Descriptive statistics for participants' categorical and continuous variables are shown in Tables S1 and S2 (Supplementary Tables), respectively. Among the 5527 participants in 2018, 1532 (27.7%) had a diagnosis of diabetes and/or cancer, 1621 (29.3%) of diabetes and/or heart disease, and 1461 (26.4%) of diabetes and/or mental disease. Among the same participants in 2016, 1138 (20.6%), 344 (6.2%), 553 (10.0%) and 273 (4.9%) had a diagnosis of diabetes, cancer, heart disease and mental disease, respectively. On average, the age, monthly income and body mass index of the participants were 71, $1205 and 24, respectively. According to Table 1, the random forest, the recurrent neural network and logistic regression were the best models in terms of accuracy and the area under the receiver-operating-characteristic curve. Their accuracy measures were 0.9725, 0.9688 and 0.9720 for diabetes-cancer, 0.9776, 0.9703 and 0.9776 for diabetes-heart disease, and 0.9797, 0.9711 and 0.9797 for diabetes-mental disease, respectively. Likewise, their areas under the receiver-operating-characteristic curves were 0.9625, 0.9630 and 0.9625 for diabetes-cancer, 0.9725, 0.9747 and 0.9725 for diabetes-heart disease, and 0.9775, 0.9779 and 0.9800 for diabetes-mental disease, respectively.
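The two criteria reported above can be illustrated with a minimal Python sketch. It assumes a binary outcome (comorbidity present or absent) and a 0.5 probability cut-off for simplicity; the function names and toy numbers are hypothetical and are not taken from the KLoSA data or from the authors' code.

```python
def accuracy(y_true, y_pred):
    """Share of predictions that match the observed labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def roc_auc(y_true, scores):
    """Area under the ROC curve, computed as the probability that a randomly
    chosen positive case receives a higher score than a randomly chosen
    negative case (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy labels and predicted probabilities (illustrative only).
y_true = [0, 0, 1, 1, 0, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 cut-off

acc = accuracy(y_true, y_pred)  # 5 of 6 predictions are correct
auc = roc_auc(y_true, scores)   # 8 of 9 positive-negative pairs are ranked correctly
```

Accuracy depends on the chosen cut-off, whereas the area under the curve summarizes ranking quality over all cut-offs, which is one reason to report both.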
Based on variable importance from the random forest (Table 2, Fig. 1), diabetes and its comorbid condition in 2016 were top-2 determinants of the comorbidity in 2018, which supports hypothesis 1. Top-10 determinants of the comorbidity in 2018 were the same for different pairs of diabetes and its comorbid condition: body mass index, income, age, life satisfaction-health, life satisfaction-economic, life satisfaction-overall, subjective health and children alive in 2016.
The SHAP results are given in Table 3 and Figs. 2 and 3. Table 3 and Fig. 2 denote SHAP summary tables and plots, respectively. Figure 3 represents SHAP dependence plots. The SHAP value of a particular determinant for a particular observation measures the difference between what the model (the random forest) predicts for the probability of the comorbidity for the observation with and without the determinant. In addition, the SHAP dependence plot reveals an interaction between two determinants regarding their effects on the predicted probability of the comorbidity. In Table 3 and Fig. 2, the SHAP values of body mass index (x033) range over (−0.09, 0.27), (−0.10, 0.26) and (−0.04, 0.27) for diabetes-cancer, diabetes-heart disease and diabetes-mental disease, respectively. There exists a strong positive association between body mass index and the comorbidity. Here, the probability of the comorbidity is expected to increase by up to 0.26-0.27 when body mass index is included in the model. Based on Table 3 and Fig. 2, the association between life satisfaction overall (x041) and the comorbidity looks positive as well. However, Fig. 3a-c reveal the opposite pattern. In Fig. 3a-c, the SHAP values of life satisfaction overall (x041) are (1) lower for those with high life satisfaction overall and (2) higher for those with diabetes (x043), shown as red dots. Here, the probability of the comorbidity is expected to decrease by 0.02-0.03 when life satisfaction overall is included in the model.

Discussion
In summary, (1) the comorbidity of diabetes and its comorbid condition is very strong in the middle-aged or old and (2) major determinants of the comorbidity are similar for different pairs of diabetes and its comorbid condition (body mass index, income, age, life satisfaction-health, life satisfaction-economic, life satisfaction-overall, subjective health and children alive). Three pairs are considered: diabetes-cancer, diabetes-heart disease and diabetes-mental disease. A few previous studies investigated these issues, albeit with dependent and independent variables different from those of this study. One previous study reported that family support (children alive, marriage), socioeconomic status (education, income) and social activity (friendship activity) are major determinants of association among cerebrovascular disease, hearing loss and cognitive impairment in a middle-aged or old population in Korea and that association among the three diseases is very strong in the middle-aged or old [7]. Likewise, another previous study suggested that family support (brothers/sisters cohabiting, parents alive), socioeconomic status (income) and social activity (voluntary activity, family activity, leisure activity, friendship meeting) are major determinants of association among diabetes mellitus, visual impairment and hearing loss in a middle-aged or old population in the nation [9].
To the best of our knowledge, this study is the first artificial-intelligence study to compare major determinants across three pairs of diabetes and its comorbid conditions. The largest cohort data set in this line of research was obtained from the KLoSA (2016-2018) for 5527 subjects aged 56 or more. The random forest and the recurrent neural network registered remarkable performance in terms of the area under the receiver operating characteristic curve, within the range of 96.3-97.8%. Furthermore, we calculated the SHAP values to identify the direction of association between a major determinant and the comorbidity in the prediction model (random forest). To the best of our knowledge, this is one of the earliest achievements to introduce a cutting-edge approach of explainable artificial intelligence. Specifically, three comments are available in the context of existing literature. Firstly, the results of this study are consistent with previous studies on social determinants of chronic diseases, calling for due attention to socioeconomic status (income) and family support (children alive) [7,9,11,12]. For example, a review study reports that family support is likely to reduce morbidity and mortality by improving cardiovascular, neuroendocrine and immune functions besides promoting health behavior and mental status [12]. Secondly, it was also found in this study that the comorbidity of diabetes and its comorbid condition is very strong in a middle-aged or old population. These findings suggest that preventive measures for diabetes and its comorbid condition should become central to policy. Thirdly, this study sheds new light on the importance of body mass index and life satisfaction in managing diabetes and its comorbid conditions across the board.
Several analyses of national surveys in the United States reported a positive association between body mass index and diabetes: the Study to Help Improve Early evaluation and management of risk factors Leading to Diabetes [...] 1994 [13][16]. However, no artificial-intelligence study was available to analyze a positive association of body mass index with three pairs of diabetes and its comorbid conditions. This study is the first investigation in this direction.
In a similar vein, a negative linkage was reported between life satisfaction and diabetes or mental disease in a few studies [17][18][19][20]. Moreover, several previous studies found that life satisfaction may act as a protective factor against cancer and heart disease [21][22][23]. To the best of our knowledge, however, no machine-learning examination has been available on the significance of life satisfaction in managing various pairs of diabetes and its comorbid conditions. Based on the findings of this study, life satisfaction is a major protective factor against diabetes and its comorbid [...] consultation for those with diabetes and its comorbid conditions. More comprehensive effort is needed in this direction and this study would be a good starting point for further research. However, this study had some limitations. Firstly, expanding the scope of this study to other chronic diseases and other determinants of comorbidity such as medication would make a great contribution to this line of research. Secondly, this study did not consider possible relationships or mediating effects among independent variables. Thirdly, sub-group analysis across age and gender (for example, men below 71, women below 71, men above 70, women above 70) would offer more insight into the major determinants of the comorbidity among the three diseases. Fourthly, the variables of life satisfaction and mental disease were measured based on single questions. Evaluating and improving their validity and reliability would be a good research topic. Finally, different artificial intelligence methods may highlight different social determinants of chronic diseases, but few studies are available and more investigation is needed on this issue.
In conclusion, the comorbidity of diabetes and its comorbid condition is very strong in the middle-aged or old, and major determinants of the comorbidity are similar for different pairs of diabetes and its comorbid condition. Preventive measures for body mass index, socioeconomic status, life satisfaction and family support would be needed for the effective management of diabetes and its comorbid condition.

Methods
Participants and variables. The data source of this study was the Korean Longitudinal Study of Ageing (KLoSA) (2016-2018) [8]. This study did not require the approval of the ethics committee given that data were publicly available (https://survey.keis.or.kr/eng/klosa/klosa01.jsp) and de-identified. Among the 6618 participants, 1091 with missing values on any of the three dependent variables and thirty-one independent variables were deleted (listwise deletion). The final sample of this study consisted of 5527 subjects aged 56 or more. The purpose of the KLoSA is to build a data source to prepare for population aging in Korea. It is a good data source for artificial intelligence, given that it is large and of sufficiently high quality for artificial intelligence models to perform well. Another desirable data source for artificial intelligence would be Korea National Health Insurance Service (KNHIS) data, which is designed to provide the socioeconomic qualification and medical utilization of all citizens in Korea (https://nhiss.nhis.or.kr/bd/ab/bdaba022eng.do). The KLoSA presents rich information on socioeconomic status and qualification of the old population, whereas the KNHIS data offers a variety of data on medical status and utilization of the entire population.
The KLoSA question on diabetes (or cancer/heart disease/mental disease) in 2016 and 2018 was "Since the last survey, have you ever been diagnosed by a doctor diabetes (or cancer/heart disease/mental disease)? 1. Yes. 5. No." [C011 (or C016/C033/C043)]. The dependent variable, the comorbidity of diabetes and its comorbid condition in 2018, was divided into four categories: "0" for having no disease; "1" and "2" for having diabetes only and its comorbid condition only, respectively; and "3" for having both diseases. This study focuses on association among diseases as their comorbidity instead of complication. The independent variables were the following determinants in 2016 [7,9]: (1) diabetes (no, yes) and its comorbid condition (no, yes); (2) demographic information, i.e., age, gender; (3) family support including children alive, brothers/sisters cohabiting, parents alive (father & mother, father, mother, none), marital status (married, separated, divorced, widowed, unmarried); (4) socioeconomic conditions such as educational level (elementary school or below, junior high school, senior high school, college or above), income (monthly, normalized between 0 and 1), health insurance (Medicare, Medicaid), economic activity (employed, unemployed); (5) social activity (monthly frequency), that is, religious, friendship, leisure, family, voluntary, political; (6) health-related information, i.e., subjective health (very good, good, middle [neither good nor poor], poor, very poor), body mass index, smoker (non, former, current), drinker (non, former, current); and (7) [...]
Analysis. Seven popular artificial intelligence approaches were compared for the prediction of the comorbidity: logistic regression, decision tree, naïve Bayes, random forest, support vector machine, artificial neural network, and recurrent neural network [7]. Data on 5527 participants were divided into training and validation sets with a 75:25 ratio (4145 vs. 1382 observations).
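The four-category outcome and the 75:25 split described above can be sketched in Python as follows. This is a minimal illustration rather than the authors' code: the function names, the fixed random seed and the simple index shuffle are assumptions, and the survey answer coding (1 = yes, 5 = no) is taken from the question quoted above.

```python
import random


def diagnosed(answer):
    """Map the KLoSA answer code to a boolean: 1 means yes, 5 means no."""
    return answer == 1


def comorbidity_code(has_diabetes, has_other):
    """Four-category outcome: 0 no disease, 1 diabetes only,
    2 comorbid condition only, 3 both diseases."""
    if has_diabetes and has_other:
        return 3
    if has_diabetes:
        return 1
    if has_other:
        return 2
    return 0


def split_indices(n, train_share=0.75, seed=0):
    """Shuffle row indices and split them into training and validation parts."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)  # assumed seed, for reproducibility only
    n_train = int(n * train_share)
    return idx[:n_train], idx[n_train:]


# Example: a participant answering "yes" for diabetes and "no" for cancer.
code = comorbidity_code(diagnosed(1), diagnosed(5))  # -> 1 (diabetes only)

train_idx, valid_idx = split_indices(5527)
print(len(train_idx), len(valid_idx))  # 4145 1382
```

With 5527 participants, truncating 0.75 × 5527 = 4145.25 gives 4145 training rows and leaves 1382 for validation, which matches the counts reported in the text.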
Accuracy, the share of correct predictions among the 1382 validation observations, was introduced as a criterion for validating the trained models. Variable importance from the random forest, the gap in accuracy (or mean impurity) between a complete model and a model excluding a certain variable, was used for testing the two hypotheses of this study. The evaluation of hypothesis 1 was based on whether diabetes and its comorbid condition in 2016 were top-5 determinants of the comorbidity in 2018. The evaluation of hypothesis 2 was based on whether top-10 determinants of the comorbidity in 2018 were similar for different pairs of diabetes and its comorbid condition. Finally, the SHAP values were calculated to analyze the direction of association between a major determinant and the comorbidity in the model (random forest). The SHAP value of a particular determinant for a particular observation measures the difference between what the model (the random forest) predicts for the probability of the comorbidity for the observation with and without the determinant.
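The idea of variable importance as an accuracy gap can be illustrated with a small Python sketch. Note the swap: instead of a fitted random forest, the sketch uses a hypothetical rule-based classifier, and instead of refitting without a variable it shuffles that variable's column (permutation importance), which is a common model-agnostic way to approximate the same gap. All names and toy data are assumptions for illustration.

```python
import random


def accuracy(model, rows, labels):
    """Share of rows for which the model's prediction equals the label."""
    hits = sum(1 for r, y in zip(rows, labels) if model(r) == y)
    return hits / len(rows)


def permutation_importance(model, rows, labels, column, seed=0):
    """Accuracy lost when one column is shuffled, which breaks its link
    to the label while keeping its marginal distribution."""
    baseline = accuracy(model, rows, labels)
    shuffled_col = [r[column] for r in rows]
    random.Random(seed).shuffle(shuffled_col)
    shuffled_rows = [
        r[:column] + (v,) + r[column + 1:] for r, v in zip(rows, shuffled_col)
    ]
    return baseline - accuracy(model, shuffled_rows, labels)


def toy_model(row):
    """Hypothetical stand-in for a fitted classifier: uses column 0 only."""
    return 1 if row[0] == 1 else 0


# Toy rows: (diabetes_2016, noise); the label simply copies diabetes_2016.
rng = random.Random(1)
rows = [(rng.randint(0, 1), rng.random()) for _ in range(200)]
labels = [r[0] for r in rows]

imp_diabetes = permutation_importance(toy_model, rows, labels, column=0)
imp_noise = permutation_importance(toy_model, rows, labels, column=1)
```

In this toy setting the shuffled diabetes column costs accuracy, while the noise column costs none, so ranking variables by the gap puts the informative predictor first.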
In practice, experts in artificial intelligence use random forest variable importance to derive the rankings and values of all predictors for the prediction of the dependent variable. Then, they employ the SHAP plots to evaluate the directions of associations between the predictors and the dependent variable. Linear or logistic regression used to play this role before the SHAP approach took it over. This is because the SHAP approach has a notable strength compared to linear or logistic regression: the former considers all realistic scenarios, unlike the latter.
Let us assume that there are three predictors of the comorbidity, i.e., diabetes, life satisfaction overall and age as in Fig. 3. As defined above, the SHAP value of diabetes for the comorbidity for a particular participant is the difference between what machine learning predicts for the probability of the comorbidity with and without diabetes for the participant. Here, the SHAP value for the participant is the average of the following four scenarios for the participant: (1) life satisfaction overall excluded, age excluded; (2) life satisfaction overall included, age excluded; (3) life satisfaction overall excluded, age included; and (4) life satisfaction overall included, age included. In other words, the SHAP value combines the results of all possible sub-group analyses, which are ignored in linear or logistic regression with an unrealistic assumption of ceteris paribus, i.e., "all the other variables staying constant". Python 3.52 (Centrum voor Wiskunde en Informatica, Amsterdam, Netherlands) was employed for the analysis in November 2022.
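The three-predictor example above can be worked through with an exact Shapley computation in plain Python. This is a sketch under stated assumptions, not the SHAP library or the authors' code: the predicted probabilities in `table` are invented for illustration. In the standard Shapley formula the four scenarios are averaged with weights that depend on how many other predictors are included (1/3, 1/6, 1/6 and 1/3 for three predictors).

```python
from itertools import combinations
from math import factorial


def shapley_value(value, features, target):
    """Exact Shapley value of `target`: weighted mean of its marginal
    contributions over all subsets of the other features."""
    others = [f for f in features if f != target]
    n = len(features)
    total = 0.0
    for size in range(len(others) + 1):
        # Standard Shapley weight for a subset of this size.
        weight = factorial(size) * factorial(n - size - 1) / factorial(n)
        for subset in combinations(others, size):
            with_target = value(frozenset(subset) | {target})
            without_target = value(frozenset(subset))
            total += weight * (with_target - without_target)
    return total


# Hypothetical predicted probabilities of the comorbidity for one participant,
# indexed by the set of predictors made available to the model.
table = {
    frozenset(): 0.10,
    frozenset({"diabetes"}): 0.40,
    frozenset({"life_sat"}): 0.08,
    frozenset({"age"}): 0.12,
    frozenset({"diabetes", "life_sat"}): 0.36,
    frozenset({"diabetes", "age"}): 0.44,
    frozenset({"life_sat", "age"}): 0.10,
    frozenset({"diabetes", "life_sat", "age"}): 0.40,
}
features = ["diabetes", "life_sat", "age"]

phi = {f: shapley_value(table.__getitem__, features, f) for f in features}
# With these toy numbers: diabetes about +0.30, life_sat about -0.03, age about +0.03.
```

The three values add up to the gap between the full prediction (0.40) and the empty-model prediction (0.10), which is the additivity property that gives SHAP its name.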
Ethics approval and consent to participate. This study did not require either the approval of the ethics committee or the informed consent of human subjects given that (1) data were publicly available (https://survey.keis.or.kr/eng/klosa/klosa01.jsp) and (2) data were de-identified (patient anonymity was preserved).

Data availability
The